Clustering in the Boolean Hypercube in a List Decoding Regime

Authors

  • Irit Dinur
  • Elazar Goldenberg
Abstract

We consider the following clustering with outliers problem: Given a set of points X ⊂ {−1, 1}^n, such that there is some point z ∈ {−1, 1}^n for which Pr_{x∈X}[⟨x, z⟩ ≥ ε] ≥ δ, find z. We call such a point z a (δ, ε)-center of X. In this work we give lower and upper bounds for the task of finding a (δ, ε)-center. Our main upper bound shows that for values of ε and δ that are larger than 1/poly log(n), there exists a polynomial time algorithm that finds a (δ − o(1), ε − o(1))-center. Moreover, it outputs a list of centers explaining all of the clusters in the input. Our main lower bound shows that given a set for which there exists a (δ, ε)-center, it is hard to find even a (δ/n^c, ε)-center for some constant c and ε = 1/poly(n), δ = 1/poly(n).

∗ Weizmann Institute of Science and Radcliffe Institute for Advanced Study. Research supported in part by the Israel Science Foundation grant no. 1179/09 and by the Binational Science Foundation grant no. 2008293 and by an ERC grant no. 239985.
† Weizmann Institute of Science.
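To make the definition of a (δ, ε)-center concrete, here is a minimal Python sketch that checks the condition for a candidate z and, for very small n, enumerates all centers by exhaustive search. It is an illustration only, not the algorithm from the paper. The function names are hypothetical, and the inner product is assumed to be normalized by n so that it lies in [−1, 1]; the abstract does not state the normalization.

```python
from itertools import product

def normalized_inner(x, z):
    # ASSUMPTION: <x, z> is normalized as (1/n) * sum_i x_i * z_i.
    return sum(a * b for a, b in zip(x, z)) / len(x)

def is_center(points, z, delta, eps):
    # z is a (delta, eps)-center if at least a delta fraction of the
    # points has normalized inner product >= eps with z.
    hits = sum(1 for x in points if normalized_inner(x, z) >= eps)
    return hits >= delta * len(points)

def brute_force_centers(points, delta, eps):
    # Exhaustive search over all 2^n candidates; only feasible for tiny n.
    n = len(points[0])
    return [z for z in product((-1, 1), repeat=n)
            if is_center(points, z, delta, eps)]

# Toy input: four points in {-1, 1}^4.
points = [(1, 1, 1, 1), (1, 1, 1, -1), (-1, -1, 1, 1), (-1, 1, -1, -1)]
z = (1, 1, 1, 1)
print(is_center(points, z, delta=0.5, eps=0.5))
print(z in brute_force_centers(points, delta=0.5, eps=0.5))
```

The exhaustive search takes time exponential in n, which is why a polynomial time algorithm in the 1/poly log(n) regime is a meaningful result.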

Similar resources

Improved Skips for Faster Postings List Intersection

Information retrieval can be achieved through computerized processes by generating a list of relevant responses to a query. The document processor, matching function and query analyzer are the main components of an information retrieval system. Document retrieval systems are fundamentally based on Boolean, vector-space, probabilistic, and language models. In this paper, a new methodology for mat...

Boolean autoencoders and hypercube clustering complexity

We introduce and study the properties of Boolean autoencoder circuits. In particular, we show that the Boolean autoencoder circuit problem is equivalent to a clustering problem on the hypercube. We show that clustering m binary vectors on the n-dimensional hypercube into k clusters is NP-hard, as soon as the number of clusters scales like m^ε (ε > 0), and thus the general Boolean autoencoder prob...

List-decodable zero-rate codes

We consider list-decoding in the zero-rate regime for two cases: the binary alphabet and the spherical codes in Euclidean space. Specifically, we study the maximal τ ∈ [0, 1] for which there exists an arrangement of M balls of relative Hamming radius τ in the binary hypercube (of arbitrary dimension) with the property that no point of the latter is covered by L or more of them. As M → ∞ the maxim...

On Optimal Erasure and List Decoding Schemes of Convolutional Codes

A modified Viterbi algorithm with erasures and list-decoding is introduced. This algorithm is shown to yield the optimal decoding rule of Forney with erasures and variable list-size. For the case of decoding with erasures, the optimal algorithm is compared to the simple algorithm of Yamamoto and Itoh. The comparison shows a remarkable similarity in simulated performance, but with a considerably...

Journal title:
  • Electronic Colloquium on Computational Complexity (ECCC)

Volume 20

Publication date 2013